[MXNET-779] Add DLPack Transformation API #12047
Conversation
This is a good change; I will make a few additional comments below.
python/mxnet/_ctypes/ndarray.py
Outdated
```diff
@@ -31,10 +31,10 @@

 class NDArrayBase(object):
     """Base data structure for ndarray"""
-    __slots__ = ["handle", "writable"]
+    __slots__ = ["handle", "writable", "dlpack_handle"]
```
Do not put dlpack_handle inside it; instead, directly return a PyCapsule that contains the handle.
python/mxnet/cython/ndarray.pyx
Outdated
```cython
def __dealloc__(self):
    CALL(MXNDArrayFree(self.chandle))
    if self.cdlpack_handle:
        CALL(MXNDArrayCallDLPackDeleter(self.cdlpack_handle))
```
Do not put dlpack_handle inside the object.
src/ndarray/ndarray.cc
Outdated
```cpp
struct NDArrayDLManager {
  NDArray handle;       // ref NDArray
  DLManagedTensor tensor;
  TShape strides;       // store variable strides
```
DLPack allows the strides field to be nullptr; for MXNet we do not need to store strides, as we only support compact tensors for now.
During conversion to an MXNet tensor, check the strides field to see whether the tensor is compact.
I will fix it. It seems that PyTorch's from_dlpack doesn't allow strides to be nullptr.
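For reference, a minimal sketch of such a compactness check on a DLTensor (illustrative only, not the PR's exact code; the helper name is made up):

```cpp
#include <dlpack/dlpack.h>

// Returns true if `strides` is absent or describes a row-major compact layout.
inline bool IsCompactDLTensor(const DLTensor& t) {
  if (t.strides == nullptr) return true;  // DLPack allows nullptr for compact tensors
  int64_t expected = 1;
  for (int i = t.ndim - 1; i >= 0; --i) {
    if (t.shape[i] != 1 && t.strides[i] != expected) return false;
    expected *= t.shape[i];
  }
  return true;
}
```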
src/ndarray/ndarray.cc
Outdated
```cpp
if (!is_none()) {
  dlmanager->tensor.dl_tensor = data().dltensor();
  // assign value for dl_tensor.strides
  if (!dlmanager->tensor.dl_tensor.strides) {
```
strides == nullptr; see my additional comments in #11964.
@tqchen Thank you!
include/mxnet/c_api.h
Outdated
```c
 * \param out_dlpack pointer holder to get pointer of DLManagedTensor
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL int MXNDArrayToDLPackForRead(NDArrayHandle handle,
```
From an API point of view, we can just expose ToDLPack and, in the Python API, explicitly call wait_for_read and wait_for_write.
I will modify it.
include/mxnet/c_api.h
Outdated
```c
 * \param dlpack_capsule the pointer of a PyCapsule storing DLManagedTensor
 * \return 0 when success, -1 when failure happens
 */
MXNET_DLL void MXNDArrayCallDLPackCapsuleDeleter(PyObjectHandle dlpack_capsule);
```
I am less certain why we need the deleter function here; can it be handled directly on the Python/Cython side?
I tried to implement a deleter function in Python, but the deleter may be released by the Python GC before it is called (see the test code); this raises a segmentation fault. The Python frontend of MXNet uses both Python (ctypes) and Cython, and it may be impossible to implement the deleter function in ctypes alone. So the deleter function should be implemented in C++.
Take a look at the destructor in apache/tvm#1573.
There is some subtlety here, but it can nevertheless be implemented.
I once tried to write a Python function as the destructor, but it couldn't pass CI; please see the MXNet CI result. All Windows test_dlpack runs failed because the destructor written in Python was released before being called. PyTorch implements the destructor with the Python API in C++, and CuPy implements it in Cython, i.e. the code is built as C++. However, MXNet uses ctypes and Cython, and I couldn't find a better way to implement the destructor than writing it in the MXNet C++ API.
Yes. I knew the trick and tried it in my previous PR, but it failed in the Windows tests (related CI). It seems TVM's CI doesn't have Windows tests, so its CI passes. The reason is that the destructor is released by the Python GC before it is called, and the GC release order differs between Linux and Windows. On Linux the destructor is called first and then released, so it works; on Windows the destructor is released before it is called, so it does not work.
This is strange, as the destructor itself sits in the global scope and should be destructed after the dltensors (which have local scope).
Yes. In the test code, it works on Linux but fails on Windows.
I see a few problems in the particular gist you pasted:
- The destructor needs to be declared in global scope (instead of being constructed when passed as an argument).
- The C string needs to outlive the capsule (construct a global string).
- The function needs to outlive the capsule (construct the C function and put it under global/module scope).
```python
import ctypes

# The ctypes function type, the destructor, and the capsule-name string all
# live at module scope so they outlive any capsule created below.
cfunc = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

def dfunc(dltensor):
    pycaps = ctypes.cast(dltensor, ctypes.py_object)
    pass

c_destructor = cfunc(dfunc)
c_str_dltensor = ctypes.c_char_p(b"dltensor")

def test():
    a = ctypes.pythonapi.PyCapsule_New(1, c_str_dltensor, c_destructor)

test()
```
Thank you! I found that it works on both Windows and Linux. I have updated the PR.
include/mxnet/tensor_blob.h
Outdated
```cpp
 * \brief constructor that construct TBlob from DLTensor
 * \param DLTensor Object
 */
explicit TBlob(const DLTensor &dltensor) : dptr_(dltensor.data),
```
Need to add a compactness check.
Specifically, TBlob only supports compact tensors; we need to check that strides == nullptr or that the strides reflect a compact layout.
Thanks. I will move the strides check from ndarray.cpp to tensor_blob.h.
python/mxnet/_ctypes/ndarray.py
Outdated
```diff
 # pylint: disable= no-member

-    def __init__(self, handle, writable=True):
+    def __init__(self, handle, writable=True, dlpack=None):
```
dlpack should not be a member; the PyCapsule manages itself.
The dlpack in NDArray is the PyCapsule returned by to_dlpack_for_read/to_dlpack_for_write. When FromDLPack is called, the NDArray needs to manage the release of NDArrayDLManager::handle in ndarray.cc, e.g.:

```python
a = mx.nd.array([1, 2, 3])  # denote by x the TBlob that stores the data
pack = a.to_dlpack_for_write()
b = mx.nd.from_dlpack(pack)
del a, pack
# a and the PyCapsule pack have been released;
# b needs to manage the release of TBlob x.
```

NDArray doesn't have a deleter function, so I made dlpack a member of NDArray; when the reference count of the dlpack in the NDArray drops to zero, the TBlob is released. Is there any other way to keep the reference?
A better way is to keep the NDArray's shared_ptr inside the manager_ctx itself; you can take a look at TVM's NDArray-to-DLManagedTensor implementation.
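A rough sketch of that pattern (the names below are illustrative, not the PR's exact code): the manager context owns a copy of the source NDArray, and the DLManagedTensor's deleter frees that copy, which drops the shared_ptr refcount.

```cpp
#include <dlpack/dlpack.h>
#include <mxnet/ndarray.h>

// Sketch only: keep the producing NDArray alive through manager_ctx.
struct NDArrayDLManagerSketch {
  mxnet::NDArray array;     // copy of the source array; holds its shared_ptr
  DLManagedTensor tensor;   // the object handed out to consumers
};

DLManagedTensor* ToDLPackSketch(const mxnet::NDArray& nd) {
  auto* ctx = new NDArrayDLManagerSketch();
  ctx->array = nd;                               // refcount of the data chunk +1
  ctx->tensor.dl_tensor = nd.data().dltensor();  // view into the same memory
  ctx->tensor.manager_ctx = ctx;
  ctx->tensor.deleter = [](DLManagedTensor* self) {
    delete static_cast<NDArrayDLManagerSketch*>(self->manager_ctx);  // refcount -1
  };
  return &ctx->tensor;
}
```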
NDArray in MXNet and TVM are different. TVM's NDArray has the functions IncRef and DecRef to change the reference count, whereas MXNet's uses NDArray::ptr_ (a std::shared_ptr) to manage the reference count, and NDArray::ptr_ is a private member of NDArray. The PR is similar to the PyTorch DLPack implementation (code). I added an NDArrayDLManager to manage the reference count of the NDArray (line 315 in src/ndarray/ndarray.cc). Setting dlpack as a member of the NDArray (Python class) is to avoid releasing the data storage: when the original NDArray and the PyCapsule (DLPack) are released, the new NDArray (generated by from_dlpack) still exists.
You can create a new NDArray() that copies the original NDArray (which increases the refcount) and put that in as the context.
In your case, when a gets deleted, b still holds an NDArrayDLManager, which is allocated by new, and that object still holds an NDArray (which holds a shared_ptr), so the original resource won't be released.
You need to be careful to use the shape from the same NDArray in your NDArrayDLManager.
Thanks. But I think the new NDArray object b couldn't hold the same shared_ptr as the original NDArray a; the new NDArray b only gets the raw data pointer, not a shared_ptr, from DLPack.

Consider the other case:

```python
import torch
import mxnet as mx
from torch.utils import dlpack

a = torch.tensor([1, 2, 3])
pack = dlpack.to_dlpack(a)
b = mx.nd.from_dlpack(pack)
del a, pack
```

When dlpack.to_dlpack is called, PyTorch allocates an ATenDLMTensor, which increases the refcount of the Torch tensor (code). After the variables a and pack are released, the ATenDLMTensor still exists.

I think the deleter should be called by the new NDArray b when b is released (refer to PyTorch's fromDLPack); however, NDArray doesn't have an explicit deleter parameter. In my PR, from_dlpack copies the dlpack object. When the old dlpack pack is released, it doesn't call the deleter. The new dlpack b.dlpack becomes a member of the new NDArray b, as NDArray(handle=handle, dlpack=dlpack_copy). When the new NDArray b is released, the new dlpack b.dlpack is released and calls the deleter, and the deleter releases the NDArrayDLManager or ATenDLMTensor; the refcount of the original NDArray a then decreases by 1.
If you copy the NDArray, the copies hold the same shared_ptr to the data; note that a shared_ptr can be copied, and its reference count is managed automatically.
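To illustrate the refcounting being described here, a tiny standalone sketch (nothing MXNet-specific):

```cpp
#include <cassert>
#include <memory>

int main() {
  auto data = std::make_shared<int>(42);  // use_count == 1
  {
    std::shared_ptr<int> copy = data;     // copying bumps use_count to 2
    int* raw = copy.get();                // a consumer may only see the raw pointer,
    assert(*raw == 42);                   // but `copy` keeps the storage alive
  }                                       // `copy` destroyed: use_count back to 1
  assert(data.use_count() == 1);
  return 0;
}
```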
I made a copy of the NDArray a member of NDArrayDLManager, and the copy increases the refcount. But I'm confused about how to modify the PR: after creating a new NDArray (Python) from DLPack and then deleting the old NDArray (Python) and the PyCapsule (DLPack), which object will call the deleter function? In my case, when a gets deleted, how does b hold the NDArrayDLManager? It seems that b only gets the raw data pointer from DLManagedTensor::dl_tensor::data, which is not a shared pointer. And how does b store the pointer to the NDArrayDLManager in an MXNet NDArray?
@tqchen
Since it is a C API change, @piiswrong can you also take a look?
python/mxnet/ndarray/ndarray.py
Outdated
```python
check_call(_LIB.MXNDArrayWaitToWrite(data.handle))
dlpack = DLPackHandle()
check_call(_LIB.MXNDArrayToDLPack(data.handle, ctypes.byref(dlpack)))
return ctypes.pythonapi.PyCapsule_New(dlpack, b'dltensor', pycapsule_dlpack_deleter)
```
You cannot directly pass in the string literal; use a global string object instead, since PyCapsule requires the name string to outlive the capsule.
Solved it. Thank you!
src/c_api/c_api.cc
Outdated
```c
typedef struct {
  char py_object[16];
```
This seems dangerous unless the PyCapsule layout is already standardized and won't change.
Yes, it's dangerous. I want to call the function PyCapsule_GetPointer in c_api.cc, but MXNet doesn't include the Python.h header file.
I have removed it in the latest PR.
@tqchen

TVM src/runtime/ndarray.cc#L148:

```cpp
NDArray NDArray::FromDLPack(DLManagedTensor* tensor) {
  NDArray::Container* data = new NDArray::Container();
  data->deleter = Internal::DLPackDeleter;
  data->manager_ctx = tensor;
  data->dl_tensor = tensor->dl_tensor;
  return NDArray(data);
}
```

PyTorch aten/src/ATen/DLConvertor.cpp#L170:

```cpp
Tensor fromDLPack(const DLManagedTensor* src) {
  Backend backend = getATenBackend(src->dl_tensor.ctx);
  ScalarType stype = toScalarType(src->dl_tensor.dtype);
  auto deleter = [src](void * self) {
    src->deleter(const_cast<DLManagedTensor*>(src));
  };
  return getType(backend, stype).tensorFromBlob(
      src->dl_tensor.data,
      IntList(src->dl_tensor.shape, src->dl_tensor.ndim),
      IntList(src->dl_tensor.strides, src->dl_tensor.ndim),
      deleter);
}
```

CuPy cupy/core/dlpack.pyx#L231:

```python
mem = DLPackMemory(dltensor)
mem_ptr = memory.MemoryPointer(mem, mem.dlm_tensor.dl_tensor.byte_offset)
cupy_array = ndarray(shape_vec, cp_dtype, mem_ptr)
```

All of these NDArray implementations save the deleter of the DLPack DLManagedTensor, so I added a new deleter member.
My point is that the DLManagedTensor should manage its own deletion, and you do not have to add a dlpack field to mx.nd.NDArray. Even when the frontend NDArray is released, the internal data will be kept alive until the PyCapsule is released and the DLManagedTensor's deleter is called.
@tqchen Hi! I have modified the PR; the DLManagedTensor now manages itself.

```python
import mxnet as mx

pack = mx.nd.array([1, 2, 3]).to_dlpack_for_write()
b = mx.nd.from_dlpack(pack)
del pack  # the DLManagedTensor's deleter gets called when `pack` is released
print(b)  # b != [1,2,3]
```

Does it need to manage the release of the dlpack in the NDArray Chunk (C++)? I added a deleter_ member for this.
pack is only supposed to be consumed once: when from_dlpack is called, the pack itself should be marked as used, and its deleter will no longer be called when the capsule gets destroyed.
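Sketched with the CPython C API (illustrative only; the PR performs the equivalent steps via ctypes in ndarray.py, and the helper name here is made up), the consume-once convention renames the capsule so its destructor knows ownership has moved:

```cpp
#include <Python.h>
#include <dlpack/dlpack.h>

// Sketch only: take ownership of the DLManagedTensor from a "dltensor" capsule.
DLManagedTensor* ConsumeDLPackCapsule(PyObject* capsule) {
  // A capsule still named "dltensor" has not been consumed yet.
  if (!PyCapsule_IsValid(capsule, "dltensor")) {
    return nullptr;  // already consumed ("used_dltensor") or not a DLPack capsule
  }
  auto* managed = static_cast<DLManagedTensor*>(
      PyCapsule_GetPointer(capsule, "dltensor"));
  // Rename the capsule so its destructor will not call the deleter again;
  // the consumer now owns `managed` and must eventually call managed->deleter.
  PyCapsule_SetName(capsule, "used_dltensor");
  return managed;
}
```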
@tqchen Yes. The latest PR implements this in ndarray.py.
@tqchen @piiswrong
include/mxnet/tensor_blob.h
Outdated
```diff
@@ -89,7 +92,8 @@ class TBlob {
   template<typename DType>
   TBlob(DType *dptr, const TShape &shape, int dev_mask, int dev_id = -1)
       : dptr_(dptr), shape_(shape),
-        type_flag_(mshadow::DataType<DType>::kFlag) {
+        type_flag_(mshadow::DataType<DType>::kFlag),
+        deleter_(nullptr) {
```
We cannot introduce a deleter to the TBlob.
include/mxnet/ndarray.h
Outdated
```diff
@@ -831,7 +855,7 @@ class NDArray {

   Chunk(const NDArrayStorageType storage_type_, const TBlob &data,
         const std::vector<TBlob> &aux_data, int dev_id)
-      : static_data(true), delay_alloc(false), storage_type(storage_type_) {
+      : static_data(true), delay_alloc(false), storage_type(storage_type_), deleter_(nullptr) {
```
We cannot plug a deleter into TBlob and Chunk; this seems like it will cause more problems than it solves. std::shared_ptr already supports a customized deleter, and maybe that can be used to solve this issue instead of hiding a customized deleter in NDArray.
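For example, a shared_ptr with a custom deleter can tie the lifetime of a borrowed DLPack buffer to the consuming array's data. This is a sketch under the assumption that the consumer stores such a shared_ptr somewhere; it is not the PR's actual code, and the function name is made up.

```cpp
#include <dlpack/dlpack.h>
#include <memory>

// Sketch only: wrap memory borrowed from a DLManagedTensor so that the
// producer's deleter runs when the last reference goes away.
std::shared_ptr<void> WrapDLPackData(DLManagedTensor* managed) {
  return std::shared_ptr<void>(
      managed->dl_tensor.data,
      [managed](void*) {
        if (managed->deleter) {
          managed->deleter(managed);  // hand the buffer back to the producer
        }
      });
}
```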
@wkcn Sorry, I was away on vacation recently; please see my comments.
@tqchen Thank you! I will try to change it.
@tqchen Hi! I have updated the code per your comments.
I think it is good to go from my end, thanks @wkcn. @piiswrong please take another look; if there are no issues, I will merge this in two days.
@szha can you also take a round of review and, if there is no problem, merge this in?
@tqchen Thank you so much!
This is now merged.
Description
Add DLPack Transformation API
DLPack is important for sharing tensors and operators among Deep Learning Frameworks.
Issue: #11964
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
- Add MXNDArrayToDLPack, MXNDArrayFromDLPack, MXNDArrayCallDLPackDeleter and MXNDArrayCallDLPackCapsuleDeleter in c_api
- Add ToDLPack() and FromDLPack for the class NDArray
- Add TBlob(const DLTensor &dltensor)
- Add DLDataTypeTransform in tensor_blob.h for the transformation from DLDataType to MShadow Type Flag
- Add the NDArray(const TBlob &data, int dev_id, const std::function<void()>& deleter) function
- Add mx.nd.NDArray.to_dlpack_for_read, mx.nd.NDArray.to_dlpack_for_write, mx.nd.to_dlpack_for_read, mx.nd.to_dlpack_for_write and mx.nd.from_dlpack
Comments
Extra Test with PyTorch